The U.S. Environmental Protection Agency (EPA) provides access to facility and compliance information for registered permit holders under the Clean Air Act and the Clean Water Act. The primary way for non-governmental entities to obtain these data is through the EPA Enforcement and Compliance History Online (ECHO) website. Data are housed under “media-specific” programs. Relevant to this post, the National Pollutant Discharge Elimination System (NPDES) maintains data on pollutant discharges to waterways, and the Air Facility System (AFS) maintains data on emissions to air. Gibbs and Simpson (2009) assess the strengths and weaknesses of the data collated by ECHO and provide an example of assessing environmental crime rates. While a discussion of the merits of EPA’s environmental data collection efforts and methodology is warranted, this post discusses a new package that provides API access to the ECHO database.
I primarily use ECHO to obtain discharge monitoring records for wastewater and industrial discharges. Until recently, my workflow was to call or email the state environmental agency and ask for all available permit numbers in the watershed. Some states maintain and provide a GIS file with spatial locations (this was preferred, but finding out when the file was last updated can be difficult). Once I obtained the permit numbers, I would log onto ECHO, type in each permit number, and individually retrieve the discharge records for each facility. This requires a lot of clicking and typing and is prone to error. Furthermore, there is no way to verify that the records I received are correct: if I mistyped a number or received a wrong record from the agency, I had little way of catching the error.
Thankfully, ECHO provides web access through “GET” and REST services, which allow some level of automated and reproducible data access. I recently wrote the echor package to provide access to these services in R. This was my first attempt at developing an R package and my first attempt at using data APIs programmatically.
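To make the GET/REST idea concrete, here is a minimal Python sketch of how a query to an ECHO REST service can be assembled as a plain URL and fetched as JSON. It is not the echor implementation, only an illustration of the general pattern such a package wraps. The base URL `https://echodata.epa.gov/echo/`, the service name `cwa_rest_services.get_facilities`, and the parameters `output` and `p_st` are assumptions for illustration and should be checked against EPA's ECHO web services documentation.

```python
"""Sketch: building GET requests for EPA ECHO REST services.

ASSUMPTIONS (not taken from the post): the base URL, the service name
"cwa_rest_services.get_facilities", and the query parameter names.
Verify them against the ECHO web services documentation before use.
"""
import json
import urllib.parse
import urllib.request

# Assumed base URL for ECHO REST services.
ECHO_BASE = "https://echodata.epa.gov/echo/"


def build_echo_url(service: str, params: dict) -> str:
    """Return a GET URL for an ECHO service with URL-encoded query parameters."""
    # Sort the parameters so the same query always produces the same URL,
    # which makes requests easy to log, compare, and rerun.
    query = urllib.parse.urlencode(sorted(params.items()))
    return f"{ECHO_BASE}{service}?{query}"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Send the GET request and parse the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical query: Clean Water Act facilities in one state, returned as JSON.
    url = build_echo_url(
        "cwa_rest_services.get_facilities",
        {"output": "JSON", "p_st": "TX"},
    )
    print(url)
    # fetch_json(url) would perform the network request.
```

Because each request is just a URL, the exact query can be saved alongside the results. A mistyped permit number then remains visible in the logged request rather than being lost in a sequence of clicks, which addresses the verification problem in the manual workflow.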